Introduction
For aspiring and current IT professionals, this project uncovers salary insights across a wide variety of IT professions. The goal is to support decisions about which profession to enter or transition into, which programming language to learn, which industry to work in, what size of company to work for, and which German city to live in to earn an average to above-average salary.
To do this, I explore a survey dataset about German IT professionals. I take a close look at what the higher-paid professionals have in common in terms of programming knowledge, company type and size, years of experience, seniority level, and city. The goal is to bring to light information that isn’t clear at first glance and use it to help others make life-changing decisions.
With this approach, I hope that fellow aspiring and current IT professionals gain a better understanding of which profession to enter or transition into.
Data Import and Preparation
Source of the data
IT Salary Survey for EU Region (2018-2020) (https://www.kaggle.com/datasets/parulpandey/2020-it-salary-survey-for-eu-region?resource=download). I sourced this data from Kaggle's public datasets. The survey was created by Sergey Vasilyev and has been conducted since 2015. Its purpose is to help discover salary patterns among IT professionals in the EU region.
| Age | Gender | City | Position | Total years of experience | Seniority level | Your main technology / programming language | Base_Salary | Сontract duration | Main language at work | Company size | Company type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 26 | Male | Munich | Software Engineer | 5 | Senior | TYPESCRIPT | 80000 | Unlimited contract | English | 51-100 | Product |
| 26 | Male | Berlin | Backend Developer | 7 | Senior | RUBY | 80000 | Unlimited contract | English | 101-1000 | Product |
| 29 | Male | Berlin | Software Engineer | 12 | Lead | JAVASCRIPT | 120000 | Temporary contract | English | 101-1000 | Product |
| 28 | Male | Berlin | Frontend Developer | 4 | Junior | JAVASCRIPT | 54000 | Unlimited contract | English | 51-100 | Startup |
| 37 | Male | Berlin | Backend Developer | 17 | Senior | C | 62000 | Unlimited contract | English | 101-1000 | Product |
| 32 | Male | Berlin | DevOps | 5 | Senior | AWS | 76000 | Unlimited contract | English | 11-50 | Startup |
Variable names and definitions: Age: age of the surveyee; Gender: gender of the surveyee; City: city the surveyee works in; Position: position the surveyee works in; Total years of experience: number of years of experience the surveyee has; Seniority level: the surveyee’s seniority level; Your main technology / programming language: the main programming language or technology the surveyee uses at work; Base_Salary: the surveyee’s base salary; Contract duration: the period for which the surveyee’s contract is effective; Main language at work: the main language spoken at the surveyee’s workplace; Company size: number of people who work at the surveyee’s company; Company type: the type of company the surveyee works for.
Exploratory Data Analysis And Visualization
I added three new variables: number_responses_per_city, the total number of responses per city; number_people_per_position, the total number of people who work in the same position; and number_people_per_main_tech, the total number of people who use the same technology/programming language at work.
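The report does not show how these count columns were built, so the following is only a minimal sketch of one way to do it with dplyr's `add_count()`. The data frame `toy` and the short column name `main_tech` are assumptions made for illustration; in the survey data the column is called "Your main technology / programming language".

```r
library(dplyr)

# Toy stand-in for the survey data (hypothetical rows, not the real dataset).
toy <- tibble(
  City = c("Berlin", "Berlin", "Munich", "Berlin"),
  Position = c("Software Engineer", "Backend Developer", "Software Engineer", "DevOps"),
  main_tech = c("JAVASCRIPT", "RUBY", "TYPESCRIPT", "JAVASCRIPT")
)

# add_count() appends a per-group row count without collapsing the rows,
# so every respondent keeps their own row plus the size of their group.
toy_counts <- toy %>%
  add_count(City, name = "number_responses_per_city") %>%
  add_count(Position, name = "number_people_per_position") %>%
  add_count(main_tech, name = "number_people_per_main_tech")

print(toy_counts)
```

Because the counts live on each row, they can be used directly as filter conditions later, for example to keep only cities with at least ten responses.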
| Age | Gender | City | Position | Total years of experience | Seniority level | Your main technology / programming language | Base_Salary | Сontract duration | Main language at work | Company size | Company type | number_responses_per_city | number_people_per_position | number_people_per_main_tech |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 26 | Male | Munich | Software Engineer | 5 | Senior | TYPESCRIPT | 80000 | Unlimited contract | English | 51-100 | Product | 236 | 388 | 31 |
| 26 | Male | Berlin | Backend Developer | 7 | Senior | RUBY | 80000 | Unlimited contract | English | 101-1000 | Product | 681 | 174 | 23 |
| 29 | Male | Berlin | Software Engineer | 12 | Lead | JAVASCRIPT | 120000 | Temporary contract | English | 101-1000 | Product | 681 | 388 | 116 |
| 28 | Male | Berlin | Frontend Developer | 4 | Junior | JAVASCRIPT | 54000 | Unlimited contract | English | 51-100 | Startup | 681 | 89 | 116 |
| 37 | Male | Berlin | Backend Developer | 17 | Senior | C | 62000 | Unlimited contract | English | 101-1000 | Product | 681 | 174 | 98 |
| 32 | Male | Berlin | DevOps | 5 | Senior | AWS | 76000 | Unlimited contract | English | 11-50 | Startup | 681 | 57 | 6 |
Upon exploring the dataset, I noticed that about half of the cities had only one response, some technologies/programming languages had very few users, two people reported an unrealistic amount of money, a few people didn’t give their gender, and two people gave their gender as diverse. To represent the majority of the surveyees accurately, I filtered the data down to people who made less than a million euros, whose gender was male or female, whose main technology/programming language had more than nine users, and whose city had more than nine responses. This left 949 of the original 1253 responses.
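library(dplyr)        # provides filter() and %>%
library(formattable)  # provides formattable()
# Keep male/female respondents earning under 1,000,000 euros, from cities and technologies with at least 10 responses
Sex <- c("Male", "Female")
Salaries2 <- Salaries %>%
  filter(number_responses_per_city >= 10, Base_Salary < 1000000, Gender %in% Sex, number_people_per_main_tech > 9)
headsal2 <- head(Salaries2)
formattable(headsal2)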
| Age | Gender | City | Position | Total years of experience | Seniority level | Your main technology / programming language | Base_Salary | Сontract duration | Main language at work | Company size | Company type | number_responses_per_city | number_people_per_position | number_people_per_main_tech |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 26 | Male | Munich | Software Engineer | 5 | Senior | TYPESCRIPT | 80000 | Unlimited contract | English | 51-100 | Product | 236 | 388 | 31 |
| 26 | Male | Berlin | Backend Developer | 7 | Senior | RUBY | 80000 | Unlimited contract | English | 101-1000 | Product | 681 | 174 | 23 |
| 29 | Male | Berlin | Software Engineer | 12 | Lead | JAVASCRIPT | 120000 | Temporary contract | English | 101-1000 | Product | 681 | 388 | 116 |
| 28 | Male | Berlin | Frontend Developer | 4 | Junior | JAVASCRIPT | 54000 | Unlimited contract | English | 51-100 | Startup | 681 | 89 | 116 |
| 37 | Male | Berlin | Backend Developer | 17 | Senior | C | 62000 | Unlimited contract | English | 101-1000 | Product | 681 | 174 | 98 |
| 37 | Male | Berlin | Frontend Developer | 6 | Middle | JAVASCRIPT | 57000 | Unlimited contract | English | 11-50 | Product | 681 | 89 | 116 |
Age Distributions
To justify cutting down the responses, I compared the mean and median age of the original and filtered datasets.
The original dataset’s mean age is 32.51 and its median age is 32. The filtered dataset’s mean age is 32.5 and its median age is 32. The filtering barely shifted the mean and left the median unchanged.
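A comparison like this could be produced with a small helper that summarises one data frame and is then applied to both. The sketch below is illustrative only: `age_summary()` is a hypothetical helper, and the ages are toy values rather than the survey data.

```r
library(dplyr)

# Hypothetical helper: mean and median of Age for one data frame.
age_summary <- function(df, label) {
  df %>%
    summarise(
      dataset = label,
      mean_age = mean(Age, na.rm = TRUE),     # na.rm drops missing ages
      median_age = median(Age, na.rm = TRUE)
    )
}

# Toy stand-ins for the original and filtered data (not the survey values).
original <- tibble(Age = c(26, 29, 28, 37, 32, NA))
filtered <- tibble(Age = c(26, 29, 28, 37))

# One row per dataset, side by side for easy comparison.
age_comparison <- bind_rows(
  age_summary(original, "original"),
  age_summary(filtered, "filtered")
)
print(age_comparison)
```

The same pattern would work for any numeric column, such as years of experience or base salary, by swapping the column inside `summarise()`.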
Gender Counts
To justify cutting down the responses, I compared the number of surveyees by gender in the original and filtered datasets.
Original dataset:

| Gender | n |
| --- | --- |
| Male | 1049 |
| Female | 192 |
| Diverse | 2 |

Filtered dataset:

| Gender | n |
| --- | --- |
| Male | 808 |
| Female | 141 |
About 77% of the male and 73% of the female responses are left. Although more male than female responses were removed in absolute terms, the filtered data still represents the original well: the male share of male and female respondents changed only slightly, from about 84.5% to about 85.1%.
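The percentages here follow directly from the counts in the two gender tables above. As a quick check, the arithmetic can be reproduced in base R; the variable names are chosen only for this sketch.

```r
# Counts taken from the two gender tables above.
original <- c(Male = 1049, Female = 192)
filtered <- c(Male = 808, Female = 141)

# Share of each gender's responses that survived the filter.
retained <- filtered / original

# Share of males among male and female respondents, before and after filtering.
male_share_before <- original[["Male"]] / sum(original)
male_share_after <- filtered[["Male"]] / sum(filtered)

print(round(100 * retained, 1))
print(round(100 * c(before = male_share_before, after = male_share_after), 1))
```

Comparing the male share before and after is a more direct test of representativeness than the retention rates alone, because it shows whether the filter changed the balance between the two groups.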
City Counts
To justify cutting down the responses, I compared the number of surveyees per city in the original and filtered datasets.
Original dataset (ten cities with the most responses):

| City | n |
| --- | --- |
| Berlin | 681 |
| Munich | 236 |
| Frankfurt | 44 |
| Hamburg | 40 |
| Stuttgart | 33 |
| Cologne | 20 |
| Düsseldorf | 15 |
| Amsterdam | 9 |
| Karlsruhe | 7 |
| Nürnberg | 7 |

Filtered dataset:

| City | n |
| --- | --- |
| Berlin | 621 |
| Munich | 203 |
| Frankfurt | 37 |
| Hamburg | 36 |
| Stuttgart | 26 |
| Cologne | 16 |
| Düsseldorf | 10 |
Comparing the cities with the most responses, every city that remained after filtering kept about two-thirds or more of its original responses, ranging from about 67% for Düsseldorf to about 91% for Berlin.
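The retention rate per city can be worked out from the two city tables above by joining them on the city name. The sketch below copies the counts from those tables; the column names `n_original`, `n_filtered`, and `pct_kept` are assumptions made for illustration.

```r
library(dplyr)

# Counts copied from the two city tables above (cities present in both).
cities <- c("Berlin", "Munich", "Frankfurt", "Hamburg", "Stuttgart", "Cologne", "Düsseldorf")
original <- tibble(City = cities, n_original = c(681, 236, 44, 40, 33, 20, 15))
filtered <- tibble(City = cities, n_filtered = c(621, 203, 37, 36, 26, 16, 10))

# Join the two count tables and compute the percentage of responses kept.
city_retention <- original %>%
  inner_join(filtered, by = "City") %>%
  mutate(pct_kept = round(100 * n_filtered / n_original, 1)) %>%
  arrange(pct_kept)  # lowest retention first

print(city_retention)
```

Sorting by `pct_kept` puts the most heavily trimmed city at the top, which makes it easy to see where the filter had the largest effect.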
Position Counts
To justify cutting down the responses, I compared the number of surveyees per position in the original and filtered datasets.
Original dataset (top ten positions):

| Position | n |
| --- | --- |
| Software Engineer | 388 |
| Backend Developer | 174 |
| Data Scientist | 110 |
| Frontend Developer | 89 |
| QA Engineer | 71 |
| DevOps | 57 |
| Mobile Developer | 53 |
| ML Engineer | 42 |
| Product Manager | 39 |
| Data Engineer | 28 |

Filtered dataset (top ten positions):

| Position | n |
| --- | --- |
| Software Engineer | 314 |
| Backend Developer | 148 |
| Frontend Developer | 73 |
| Data Scientist | 71 |
| QA Engineer | 54 |
| Mobile Developer | 41 |
| DevOps | 36 |
| Product Manager | 33 |
| ML Engineer | 28 |
| Data Engineer | 23 |
After the filtering, we still have a good variety of responses per position. Our top ten positions remained the same between both datasets with some mild ordering change.
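The claim that the top ten is unchanged apart from ordering can be checked mechanically. The following base R sketch uses the position names from the two tables above, in table order; the variable names are only illustrative.

```r
# Top ten positions copied from the two tables above, in table order.
top_original <- c("Software Engineer", "Backend Developer", "Data Scientist",
                  "Frontend Developer", "QA Engineer", "DevOps", "Mobile Developer",
                  "ML Engineer", "Product Manager", "Data Engineer")
top_filtered <- c("Software Engineer", "Backend Developer", "Frontend Developer",
                  "Data Scientist", "QA Engineer", "Mobile Developer", "DevOps",
                  "Product Manager", "ML Engineer", "Data Engineer")

# Same set of positions, regardless of order?
same_positions <- setequal(top_original, top_filtered)

# Rank shift per position: positive means the position moved up after filtering.
rank_shift <- seq_along(top_original) - match(top_original, top_filtered)
names(rank_shift) <- top_original

print(same_positions)
print(rank_shift[rank_shift != 0])
```

Every shift is a single place up or down, which matches the description of a mild ordering change.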
Experience Distribution
The original dataset’s mean number of years of experience is 8.76 years and its median is 8 years. The filtered dataset’s mean is 8.86 years and its median is 8 years. The mean shifted only slightly and the median did not change, which suggests that we can proceed with the filtered data.
Seniority Counts
Original dataset:

| Seniority level | n |
| --- | --- |
| Senior | 565 |
| Middle | 366 |
| Lead | 166 |
| Junior | 79 |
| Head | 44 |
| Entry level | 4 |

Filtered dataset:

| Seniority level | n |
| --- | --- |
| Senior | 444 |
| Middle | 265 |
| Lead | 135 |
| Junior | 47 |
| Head | 38 |
| Principal | 3 |
Looking at the number of surveyees per seniority level in both datasets, we do see a noticeable drop, but each of the top five seniority levels kept roughly 60% or more of its responses after the filtering (Junior kept the least, 47 of 79). The top five seniority levels and their order are the same in both datasets.
Main Language Counts
Original dataset:

| Main language at work | n |
| --- | --- |
| English | 1024 |
| German | 186 |
| Russian | 15 |
| Italian | 3 |
| Spanish | 3 |

Filtered dataset:

| Main language at work | n |
| --- | --- |
| English | 812 |
| German | 123 |
| Russian | 4 |
| Deuglisch | 1 |
Comparing the number of surveyees per main language used at work, English remained the most common language in the workplace in both datasets. German and Russian came in second and third, respectively.
Main Tech Counts
Original dataset (top ten):

| Your main technology / programming language | n |
| --- | --- |
| PYTHON | 228 |
| JAVA | 216 |
| JAVASCRIPT | 116 |
| C | 98 |
| PHP | 73 |
| GO | 32 |
| TYPESCRIPT | 31 |
| SWIFT | 30 |
| SCALA | 28 |
| KOTLIN | 27 |

Filtered dataset (top ten):

| Your main technology / programming language | n |
| --- | --- |
| JAVA | 194 |
| PYTHON | 173 |
| JAVASCRIPT | 103 |
| C | 69 |
| PHP | 64 |
| GO | 28 |
| SCALA | 27 |
| TYPESCRIPT | 26 |
| SWIFT | 25 |
| KOTLIN | 23 |
Comparing the two datasets for the main technology/programming language used at work, the top ten programming languages remained the same even though the ordering changed. Python was the most used language in the original dataset, but in the filtered dataset Java takes the top spot.
Salary Distributions
The mean salary in the original dataset is 71,655 euros and the median salary is 70,000 euros. The mean salary in the filtered dataset is 73,557 euros and the median salary is 70,000 euros. The mean differs by less than 2,000 euros between the two datasets, and the median does not differ at all.
Experience Vs Salary
Gender
Which gender gets paid more? It turns out that males get paid more in general.
City
Which city has the higher salary range? It looks like the highest-paying cities are Berlin and Munich.
Technology/Programming Language
What technology/programming language should you learn? In order to receive a higher base salary, it looks like a person needs to know Python, Java, C, or Go.
Position
What position should you aim to work in? It turns out that the highest-paying positions are data scientist, software engineer, and backend developer.
Seniority Level
What seniority level should you strive for? In order to receive a higher base salary, a person should strive for a higher seniority level such as head, lead, or senior.
Company Size
What size company should you work at? It turns out that company size doesn’t matter much for receiving a higher salary, although, in general, someone should work at a company with 50+ people.
Company Type
What type of company should you work at? In order to receive a higher base salary, a person should try to work at a startup or a product company.
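The group comparisons in the sections above could be backed by a per-group salary summary. The report does not state which statistic or plot was used, so the sketch below is an assumption: `salary_by()` is a hypothetical helper, and the toy rows are taken from the sample rows previewed earlier, not from the full dataset.

```r
library(dplyr)

# Hypothetical helper: salary summary for one grouping column.
salary_by <- function(df, group_col) {
  df %>%
    group_by({{ group_col }}) %>%
    summarise(
      n = n(),                               # group size, to spot thin groups
      median_salary = median(Base_Salary),
      mean_salary = mean(Base_Salary),
      .groups = "drop"
    ) %>%
    arrange(desc(median_salary))             # highest-paying group first
}

# Toy rows built from the sample rows previewed earlier in this report.
toy <- tibble(
  City = c("Munich", "Berlin", "Berlin", "Berlin", "Berlin", "Berlin"),
  `Seniority level` = c("Senior", "Senior", "Lead", "Junior", "Senior", "Middle"),
  Base_Salary = c(80000, 80000, 120000, 54000, 62000, 57000)
)

by_city <- salary_by(toy, City)
by_seniority <- salary_by(toy, `Seniority level`)
print(by_city)
print(by_seniority)
```

Reporting the group size next to the median helps flag groups that are too small to support a conclusion, and the median is less sensitive than the mean to a few very high salaries.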
Summary
This analysis is intended to help aspiring and current IT professionals decide which profession to enter or transition into, which tools to learn, which city to live in, what size of company to work for, and what kind of company to work for. Hopefully, through data visualization, people can decide which profession to pursue.
Insights
It turns out that in order to make a decent living in Germany, someone should work at a product company or a startup, know Go, Python, Java, or C, live in Berlin or Munich, work as a data scientist, software engineer, or backend developer, work at a company with 50+ people, and work their way up to a higher seniority level. It also turns out that a person who knows English should be able to move to Germany and transition into a Germany-based company without much trouble, because English is widely used in the workplace.
My analysis was limited by the small number of responses for many cities, many main programming languages, and some gender categories. We know that there is a bias towards Berlin and Munich because these two cities had far more surveyees than every other city. There is also a bias in which gender gets paid the most because there were many more male surveyees.
Next, I would like to use the previous years’ data and perform a time series analysis. I believe that this will allow me to have a larger pool of data to get an accurate representation of European IT professionals and their salaries, skills, and professions.